perm filename RED.LM3[LSP,JRA]3 blob sn#225085 filedate 1976-07-08 generic text, type C, neo UTF8
Thus the terminology "solvability" is apropos. Given ?M, an equation of the form
?M?X1#...#?Xk#=#?X can be solved for all terms ?X iff ?M is solvable.
We define a term with free variables as being solvable if there is a substitution 
of closed terms for its free variables such that the resulting closed term 
is solvable.

.GROUP;
Definition:
.BEGIN INDENT 5,5;
A term ?M is %3⊗→solvable↔←%1 iff there is a head context⊗↓Definition
{yon(R18)}.← ∪C[#]∩ such that ∪C[M]∩ has a normal form.
.end
.apart

The concept of %3head normal form%1 provides a syntactic characterization
of the solvable terms. Consider again the unsolvable terms ?M1#≡#⊗d⊗d and
?M2#≡#⊗d∪(λx.xxx)∩, where ⊗d#≡#∪(λx.xx)∩.
Inspecting the reductions of these terms we find:

.begin indent 5,7;fill;
1) For ?M1 ≡ ⊗d⊗d, ?M1 reduces only to itself.

2) For ?M2 ≡ ⊗d∪(λx.xxx)∩, letting ∪T ≡ (λx.xxx)∩, every term to which ?M2
reduces is of the form ∪((λx.xxx)TT)T#...#T∩ where ∪T∩ appears some finite
number of times (≥1).

.nofill
Consider also:
3) ?M3 ≡ ⊗d∪(λx.λy.xx)
       $r ∪(λx.xx)(λx.λy.xx)
       $r ∪(λx.λy.xx)(λx.λy.xx)
       $r ∪λy.(λx.λy.xx)(λx.λy.xx)
       $r ∪λy.λ?y1∪.(λx.λy.xx)(λx.λy.xx)
       $r ∪λy.λ?y1∪.λ?y2∪.(λx.λy.xx)(λx.λy.xx)∩
	etc.
Every term to which ?M3 reduces is of the form (after ?α-conversion):
	 ∪λy.λ?y1∪.λ?y2∪....?yn∪.(λx.λy.xx)(λx.λy.xx)∩

4) ?M4 ≡ ⊗d∪(λx.λy.xxx)∩, denoting ∪(λx.λy.xxx)∩ as ∪T∩', we have
       $r ∪T'T'
       $r ∪λy.(λx.λy.xxx)T'T'
       $r ∪λy∪.(λ?y1∪.(λx.λy.xxx)T'T')T'
	 ...∩

Every term to which ?M4 reduces is of the form (after ?α-conversion)
	∪λ?x1∪.(λ?x2∪.(...(λ?xn∪.(λx.λy.xxx)T'T') ...)T')T'∩
.end
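The looping reductions just displayed can be checked mechanically. The following minimal Python sketch (the tuple encoding and the names ∪subst∩ and ∪beta∩ are our own; substitution is capture-naive, which is safe for these closed examples) verifies case 1) and the shape of case 2):

```python
# Lambda terms as tuples: ('var', x), ('lam', x, body), ('app', f, a).
def subst(t, x, s):
    """Capture-naive substitution t[x := s]; adequate for closed examples."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, s))
    return ('app', subst(t[1], x, s), subst(t[2], x, s))

def beta(t):
    """Contract a top-level redex (lam x. M) N, if there is one."""
    if t[0] == 'app' and t[1][0] == 'lam':
        return subst(t[1][2], t[1][1], t[2])
    return t

d = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))                       # d = λx.xx
T = ('lam', 'x', ('app', ('app', ('var', 'x'), ('var', 'x')), ('var', 'x')))  # T = λx.xxx
M1 = ('app', d, d)
assert beta(M1) == M1                                  # case 1): M1 reduces only to itself
M2 = ('app', d, T)
assert beta(beta(M2)) == ('app', ('app', T, T), T)     # case 2): the ((λx.xxx)T)T pattern
```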
These reduced forms of ?M1, ?M2, ?M3, and ?M4 all have the general form that:
either they consist of a single ?β-redex, or of a ?β-redex followed by a
finite number of arguments, or of a finite number of abstractions on one of these.
That is, they are of the form:
.begin center;

(%74%1)   ∪λ?x1?x2#...#?xn∪.((λx.M)N)?X1?X2 ... ?Xm∩ n≥0,m≥0.
.end
Such a term is called a %3term ⊗→not in head normal form↔←%1 and the redex
∪(λx.M)N∩ is called its %3head redex%1.

The "paradoxical combinator" ∪Y%4λ%d ≡ λf.(λx.f(x x))(λx.f(x x))∩ is of the form
%74%1, but can be reduced to the term
∪λf.f((λx.f(x x))(λx.f(xx)))∩ which is not of the form %74%1. Also the terms:
.begin tabit1(30);select d;
\M ≡ λx.x(K)(A)
\N ≡ λx.x(KI)(A),
.end
where ?A is any term without a normal form, are terms which have no normal form and
are not of the form %74%1. These three terms are not in normal form, but they are in
%3⊗→head normal form↔←%1#(⊗→h.n.f.↔←); that is, they are of the general form:
.begin center;select d;

(%75%d) λ?x1?x2 ...?xn∪.z∩ ?X1?X2 ...?Xk
.end
where n≥0, k≥0, ?z is a variable, and ?Xi are arbitrary terms.
Every term can be written in one of the forms %74%1 or %75%1 for suitable
n, k, ?Xi.

In %75%1 the variable ?z is known as the %3⊗→head variable↔←%1, and  the term ?Xi
as its ⊗→%3i%8th%3 main argument↔←%1. Two head normal forms
are said to be %3⊗→similar↔←%1 if they have the same head variable, the same number
of main arguments, and the same number of initial bound variables.
A term ?H in h.n.f. is said to be a %3head normal form of ?M%1 if ?M#$c#?H. 

The
reduction of ?M which always reduces the head redex is called the %3⊗→head reduction↔←%1.
Analogously to normal  order reduction of terms to normal form, it
can be shown that a term has a head normal form iff its head
reduction terminates, and all head normal forms of a term are similar.
Thus we use head reduction as our computation scheme, being
interested only in whether or not a term has an h.n.f., and not caring
about a normal form.
We shall see that it is consistent to equate all terms not having an h.n.f. We
know that computation will terminate precisely for those terms having an h.n.f.;
however, we have no decision procedure for recognizing whether or not an arbitrary
term has an h.n.f.
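The head reduction scheme can be sketched as a small Python procedure (the names, the fuel bound, and the capture-naive substitution are our own simplifying assumptions, adequate for the closed examples below). It peels the leading abstractions, unwinds the applications, contracts the head redex when the head is an abstraction (form %74%1), and stops when the head is a variable (form %75%1):

```python
# Terms: ('var', x), ('lam', x, body), ('app', f, a).
def subst(t, x, s):
    """Capture-naive substitution t[x := s]."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, s))
    return ('app', subst(t[1], x, s), subst(t[2], x, s))

def decompose(t):
    """Split t as lam x1...xn. h X1...Xm, returning (binders, head, args)."""
    binders = []
    while t[0] == 'lam':
        binders.append(t[1]); t = t[2]
    args = []
    while t[0] == 'app':
        args.append(t[2]); t = t[1]
    args.reverse()
    return binders, t, args

def head_reduce(t, fuel=100):
    """Repeatedly contract the head redex; return an h.n.f., or None if fuel runs out."""
    for _ in range(fuel):
        binders, head, args = decompose(t)
        if head[0] == 'var':
            return t                              # form (5): a head normal form
        t = subst(head[2], head[1], args[0])      # contract the head redex
        for a in args[1:]:                        # restore the remaining arguments
            t = ('app', t, a)
        for x in reversed(binders):               # restore the leading binders
            t = ('lam', x, t)
    return None

d = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))
w = ('lam', 'x', ('app', ('var', 'f'), ('app', ('var', 'x'), ('var', 'x'))))
Y = ('lam', 'f', ('app', w, w))                   # the paradoxical combinator
assert head_reduce(('app', d, d)) is None         # unsolvable: head reduction never stops
assert head_reduce(Y) is not None                 # Y reaches an h.n.f. in one head step
```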

Comparing %75%1 with the structure of normal forms discussed earlier ({yon(R14)}),
we can see that in a head normal form the main arguments, ?X1,#...,#?Xk
can be arbitrary terms, whereas a normal form requires that these
subterms also be in normal form. Thus a head normal form is a "weak" kind
of normal form which is in normal form only "at the first level".

It is easy to see that terms having a head normal form are solvable⊗↓The proof
of lemma 3, ({yon(R19)}), requires only that ?M have a %2head%1 normal form.←.
The converse can also be shown by syntactic means though we shall see a much easier
proof using Scott's lattice models⊗↓Theorem 17, Section 5.4←.

Thus we see that the unsolvable terms provide a more satisfactory analogue of
non-terminating programs than do terms not having a normal form.  Basically, the
reason for this is that unsolvability is preserved under application and composition,
that is, if ∪U∩ is unsolvable then so are ∪UX∩ and ∪U⊗x∪X∩⊗↓"?U composed
with ?X"← for all ?X, corresponding
to our intuition about non-terminating programs: if, for example,
we perform two programs ⊗P1 and ⊗P2 in sequence and ⊗P1 fails to terminate, then
so does the sequence ⊗P1;⊗P2.
This behavior is not modelled by terms failing to have
normal forms, because, in general, the property of not having a normal form
is not preserved by application or by composition.

It was first shown by Barendregt that it is consistent to identify
all unsolvable terms⊗↓We can show this using Scott's models.←. From
a computational point of view it is certainly desirable to do so since they
behave alike.

We have considered what happens if terms are applied as functions; 
what if functions are applied to them? Even  for unsolvable terms we
can always obtain a normal form as result by using a constant function.
For example if ∪U∩ is unsolvable:
.begin center
∪(KI)U $r∪ I∩
.end
 But this method of using ?U as a subterm and obtaining a normal form
as a result is trivial, since ∪(KI)M#$r#∪I∩ for all terms ?M. In fact,
we can make a stronger statement.

%7LEMMA 4%1 If ?U is unsolvable, then, for all contexts ∪C[#]∩,
.begin indent 5,5
a) If ∪C[U]∩ has a normal form, then ∪C[M]∩ has the same normal form for all ?M.

b) If ∪C[U]∩ has a head normal form, then ∪C[M]∩ has a similar 
head normal form for all ?M.
.end

So the interpretation of unsolvable terms as being
"totally undefined" or "least defined" and as "containing no information" is a good one.
It should not be surprising, therefore, when we find that the unsolvable terms are exactly the terms
with value ⊗b in Scott's model.
.SS(Some background on lattices)
Programs cannot compute with arbitrary infinite objects. The mathematical
entities manipulated by programs must, in some sense, be determined by their
finite approximations; otherwise there would be no effective way of carrying out
the computation. To begin to see how we can build a theory of finite
approximation appropriate for the project of mathematical semantics, we argue
step-by-step as follows:
.begin indent 5,5;fill;
1) Mathematical semantics requires (some) "infinite" objects having no explicit finite
representation.

2) Infinite objects can never be presented or "computed" entirely (i.e., in finite
time), but one can determine "approximations" to what they really are.

3) Approximations are "partial objects" which only give some of the information
about the data object.

4) For an infinite object to stand a chance of being computable, it must contain
no information over and above that which is contained in its finite
(i.e., finitely representable) approximations.
.end
These observations lead to the "⊗→enumeration model↔←" of computation: a data object πx
can be computed by any "program" which enumerates a sequence of approximations whose
total "information content" is πx.

Thus data spaces of interest for a theory of computation should be "richer" 
than the conventional notion of a data type; in addition to the usual "perfectly
defined" elements (integers, total functions, etc.) our data spaces
should also contain both partial (i.e. partially defined) objects
and infinite objects as "limits" of partial objects. We use the term
%3⊗→domain↔←%1 to refer to such data spaces. We have a relation of ⊗→approximation↔←
defined on these domains; we write:
.begin center;
⊗→%2x %f≤ %2y↔←%1
.end
(read: πx "approximates" πy, or πx is less defined than, or equal to, πy) to mean,
intuitively, that πy contains all the information that πx does.
Clearly, this relation is reflexive, transitive, and anti-symmetric, and thus,
is a partial order on our domain.

To combine the information given by a set πX of approximations we introduce
the operation ⊗→%fl%2X↔←%1, the %3⊗→join↔←%1 of the set πX, which is the least upper bound of 
πX under the partial ordering.
It must contain the information of all elements of πX (i.e., it must be an
upper bound), and it should not contain any further information (i.e., it
must be the least upper bound).

Our domain is a ⊗→complete lattice↔← under the partial ordering π≤, which means
that πlπX exists for all subsets πX of the domain. We can define
some other lattice-theoretic concepts in terms of πl as follows:

.BEGIN "A" TURN OFF "{","}";
.BEGIN CENTER
πgπX = πl{πw : πwπ≤πx for all πxεπX}
.end
.BEGIN  TURN On "{","}";
The %3⊗→meet↔←%1 of the set πX⊂πD⊗↓"⊂" is a notation for 
"is a subset of".← retains only the information common to all elements of πX,
giving a greatest lower bound.
.end
.begin center;

⊗→%fu↔←%1 = πl{ }  (or πgπD)
.end
"⊗→Bottom↔←" gives no information, is totally undefined.
.begin center;

πt = πlπD   (or πg{ })
.end
"⊗→Top↔←" gives too much information, it is overdefined to the point of being
inconsistent.
Note that ∀πxεπD,#πxπ≤πt and ⊗bπ≤πx.
.begin tabit1(15);group;

\πxπllπy = πl{πx, πy}
\πxπlgπy = πg{πx, πy}
\If πxπllπy = πt and neither is πt, then πx and πy are incomparable, i.e., not#πxπ≤πy and not#πyπ≤πx.
\If πxπlgπy = ⊗b and neither is ⊗b, then πx and πy are likewise incomparable.
.end
.end "a"
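These definitions can be exercised on the simplest complete lattice of all: the powerset of a finite set under inclusion. In the Python sketch below (the names are ours), join is union, the bottom and top elements arise as the joins and meets of the empty set, and the meet is computed directly from the formula πgπX#=#πl{πw#:#πwπ≤πx for all πxεπX}:

```python
from itertools import chain, combinations

TOP = frozenset({0, 1, 2})
# D is the powerset of {0,1,2}, a complete lattice under inclusion.
D = [frozenset(c) for c in chain.from_iterable(
        combinations(sorted(TOP), r) for r in range(len(TOP) + 1))]

def join(X):
    """Least upper bound: union (the empty join is bottom, the empty set)."""
    return frozenset().union(*X) if X else frozenset()

def meet(X):
    """Greatest lower bound, via g X = l {w : w <= x for all x in X}."""
    return join([w for w in D if all(w <= x for x in X)])

x, y = frozenset({0}), frozenset({1})
assert join([x, y]) == frozenset({0, 1})
assert meet([x, y]) == frozenset()       # g{x,y} = bottom: x and y are incomparable
assert join([]) == frozenset()           # bottom = l{ }
assert meet([]) == TOP                   # top = g{ } = l D
```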

We define a function πf:#πD#→#πD to be %3⊗→continuous↔←%1 if 
πf(πlπX)#=#πlπf(πX) for every directed set πX⊂πD. If πf is continuous, then πf is
monotonic: πxπ≤πy#=>#πf(πx)π≤πf(πy).

.R22:
.group;
The lattice theoretic ⊗→least fixpoint operator↔← ⊗→%2Y↔←%1 is defined:
.begin turn off "{","}";
.begin center;

πY(πf) = πL{πf%8n%1}
.end
and produces the least fixed points of continuous functions.
A recursive function πf is typically defined to be some function, %gt%1, of the
identifier πf.  We define the 
sequence {πf%8n%1} by: πf%80%1#=#%gt%1(⊗b), πf%8n+1%1#=#%gt%1(πf%8n%1).
.end
.apart
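The approximation sequence can be watched concretely. In the sketch below (our own encoding, not part of the text), partial functions on the naturals are Python dicts, ⊗b is the empty dict, and %gt%1 is the functional underlying factorial; each πf%8n%1 is defined on one more argument than its predecessor, and the sequence forms a chain whose join is the factorial function:

```python
def tau(f, domain=range(8)):
    """The functional underlying factorial: tau(f)(n) = 1 if n == 0,
    else n * f(n-1), defined wherever the right-hand side is defined."""
    g = {0: 1}
    for n in domain:
        if n >= 1 and n - 1 in f:
            g[n] = n * f[n - 1]
    return g

approx = [tau({})]                        # f0 = tau(bottom): defined only at 0
for _ in range(6):
    approx.append(tau(approx[-1]))        # f(n+1) = tau(fn)

assert approx[0] == {0: 1}
assert approx[3] == {0: 1, 1: 1, 2: 2, 3: 6}
# the sequence is a chain: each fn approximates f(n+1) (dict inclusion)
assert all(a.items() <= b.items() for a, b in zip(approx, approx[1:]))
```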

The join (l.u.b.) πlπX of a set πX may properly be called the %3⊗→limit↔←%1
of the set  only when πX is %3directed%1.  A subset πX of a domain is a 
%3⊗→directed set↔←%1 iff every finite subset of πX has an upper bound in πX.
A directed set is %3interesting%1 if it does not contain its own least
upper bound.

A particular case of a directed set which it is generally sufficient to consider
is an infinite sequence of elements which form a chain.  A sequence πx0, πx1, πx2, ...
forms a %3⊗→chain↔←%1 iff πx0π≤πx1π≤πx2π≤...
.begin turnoff "{", "}";
For chains, we generally write###πLπxn#for#πl{πxn#|#n≥0}
.end

The following properties are derived from these definitions:
.begin indent 7,7;nofill;group
L1. A set πX is directed iff it is non-empty and every pair of elements in πX has an upper 
    bound in πX.
L2. All interesting directed sets are infinite.
L3. Every countably infinite directed set contains a chain with the same limit.

.end
An element πx of a domain πD is a %3⊗→limit point↔←%1 iff πx = πlπX for some 
interesting directed πX⊂πD.

We introduce the idea of a basis since, as mentioned before, we
want to restrict our attention to domains in which all elements are determined
by their finite (that is, finitely representable) approximations.  We can think
of a finite approximation as one which contains a finite amount of information.

Let πE⊂πD be the set of finitely representable elements of πD.  We wish to say 
that every πxεπD is the join of some set of finitely representable elements;
that is, there is some πE'⊂πE such that πx#=#πlπE'.  But when this holds, we
can include any element of πE which is π≤πx in πE' without changing its join, so,
for all πxεπD,
.begin turnoff "{", "}"; center

(*)         πx = πl{πe | πeεπE and πeπ≤πx}

.end
A set πE⊂πD such that (*) holds for all πxεπD is called a %3⊗→sub-basis↔←%1 of 
πD.  We wish that πE consist only of finitely representable elements,
and be "effectively given" (an algorithm can be described to generate its
elements), which
requires that πE be countable.  In fact, it is convenient to take this
requirement a step further.  Motivated by our intuition that the join of a
finite number of finite approximations still contains only a finite amount
of information, we define a basis.

A subset πE⊂πD is a %3⊗→basis↔←%1 of πD iff it is a sub-basis and, for all finite
πE'⊂πE, πlπE'επE.  
.begin turnoff "{", "}";
Since the set  {πlπE' | πE'⊂πE, πE' finite} of finite joins of sub-basis
elements is countable whenever the sub-basis itself is countable, we
postulate that:  
.begin center;
Every domain has a countable basis.
.end
.end
This axiom effectively limits the size of a domain.  Since there are at most a
continuum number of subsets of a countable set, this implies that all domains
we consider have cardinality no larger than that of the real numbers.

The following properties are consequences of the definitions just given:  Let
πE be a basis for πD, then
.begin indent 10,10;nofill;turnoff "{", "}";
B1.  For all πxεπD with πx ≠ πt, the set {πe | πeεπE and πeπ≤πx} is directed.
B2.  If πxεπD is not a basis element, then πx is a limit point.
B3.  When πE is countable, every limit point πxεπD can be expressed as the limit of a
     chain of basis elements.
.end
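Property B3 can be illustrated in the domain of all subsets of the naturals, where the finite sets form the basis. In this sketch (truncated to a finite stage so it can actually be checked; the names are ours), the infinite element "the even numbers" is a limit point, and it is the limit of a chain of finite basis elements:

```python
# Domain: subsets of the naturals under inclusion; basis: the finite sets.
# The even numbers form a limit point; per B3 it is the limit of a chain of
# basis elements.  We truncate everything at 20 to make the check finite.
x = frozenset(range(0, 20, 2))            # finite stand-in for the limit point
steps = [frozenset(range(0, n, 2)) for n in range(0, 22, 2)]

assert all(a <= b for a, b in zip(steps, steps[1:]))   # e0 <= e1 <= ... : a chain
assert frozenset().union(*steps) == x                  # its join (limit) is x
```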


.begin "at|]"; 
.at "|"⊂"%f)%1"⊃
.ss(The Models)
.begin indent 15, 15;
"The lambda-calculus is a curious language; one of my aims in life is to
make it somewhat less curious."
.nofill;
							D. Scott

.fill;
.end
We assume the existence of a complete lattice ⊗→%dD%4∞%1↔←, with more than one
element, which is isomorphic to its own continuous function space [$D→$D].  (For
construction of the models represented by $D see Wadsworth [15] or Scott#[13].)
  We can express this isomorphism by two continuous functions
.begin center;

$f : $D → [$D → $D]
and
$y : [$D → $D] → $D
.end
.begin indent 10,20; nofill;
with the properties:
1) $y($f(πx)) = πx,    for all πxε$D
2) $f($y(πf)) = πf,    for all πfε[$D → $D]
.end

To interpret the ?λ-calculus in $D, we must first introduce some notation.
.begin nofill; tabit2 (15, 25);
\⊗→%3Id%1↔←\for the set of all identifiers (variables)
\⊗→%3Exp%1↔←\for the set of all ?λ-terms
\⊗→%3Env%1↔←\for the set of all environments
.end
and we shall use the meta-variable πd for a typical element of $D.  By an
%3⊗→environment↔← ⊗→%gr%1↔← we mean an association of values ("denotations") from $D
with all identifiers:
.begin center;
πr : $I → $D
.end
thus, $N is the set of all functions from $I to $D.  Given an environment in which
to interpret its free variables, the meaning of a term will be determined as
a composite of that of its immediate subterms by a mapping ⊗→%6V%1↔← which takes
a term as argument and produces a function from $N to $D as result:

.begin center;
πV : $X → [$N → $D]⊗↓Sch⊗onfinkeling again!←
.end
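The shape of this definition can be previewed with a deliberately crude Python sketch, in which Python's own functions stand in for $D (so both retractions $f and $y become the identity); all names are our own, and this illustrates only the form of the semantic equations, not a construction of $D%4∞%1:

```python
# Terms: ('var', x), ('lam', x, body), ('app', f, a).  Environments: dicts Id -> D,
# with Python values/functions standing in for D.
def V(term, rho):
    """V : Exp -> [Env -> D], one clause per syntactic form."""
    tag = term[0]
    if tag == 'var':
        return rho[term[1]]                        # look the identifier up in rho
    if tag == 'lam':
        x, body = term[1], term[2]
        return lambda d: V(body, {**rho, x: d})    # psi: wrap a function on D as an element
    f, a = term[1], term[2]
    return V(f, rho)(V(a, rho))                    # phi: use a D element as a function

I = ('lam', 'x', ('var', 'x'))
assert V(('app', I, I), {})(7) == 7                # (λx.x)(λx.x) denotes the identity
```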